ABSTRACT

For a variety of regularization methods, algorithms computing
the entire solution path have been developed recently.
Solution path algorithms compute not only the solution
for one particular value of the regularization parameter but
the entire path of solutions, making the selection of an optimal
parameter much easier. It has been assumed that these
piecewise linear solution paths have only linear complexity,
i.e. linearly many bends. We prove that for the support
vector machine this complexity can indeed be exponential
in the number of training points in the worst case.
1. INTRODUCTION

Support vector machines (SVMs) and related kernel methods
have been applied successfully to many optimization,
classification and regression tasks in a variety of areas such as
signal processing, statistics, biology, surface reconstruction
and data mining.

These regularization methods have in common that they are
convex, usually quadratic, optimization problems containing
a special parameter in their objective function, called the
regularization parameter, which represents the tradeoff between
small model complexity (regularization term) and good
accuracy on the training data (loss term), or in other words
the tradeoff between good generalization performance and
overfitting. In particular, the C- and $\nu$-SVM versions with
both $\ell_1$- and $\ell_2$-loss [5] [7], support vector regression [23],
the LASSO [24], the one-class SVM [22], $\ell_1$-regularized least
squares [15], and compressed sensing [10] are all instances of
parameterized quadratic programs (pQPs) of the form



\[
\mathrm{QP}(\mu) \qquad
\begin{array}{ll}
\text{minimize}_{x} & x^T Q x + c(\mu)^T x \\
\text{subject to} & A x \ge b(\mu) \\
 & x \ge 0,
\end{array}
\tag{1}
\]



where $c : \mathbb{R} \to \mathbb{R}^n$ and $b : \mathbb{R} \to \mathbb{R}^m$ are functions that
describe how the linear objective function $c$ and the
right-hand side $b$ vary with some real parameter $\mu$. $Q$ is an $n \times n$
symmetric positive semidefinite (PSD) matrix, usually the
kernel matrix, $c$ is an $n$-vector (the linear objective
function), $A$ is an $m \times n$ matrix (the constraint matrix), and $b$
is an $m$-vector (the right-hand side).



The task of solving such a problem for all possible values of
the parameter $\mu$ is called parametric quadratic programming.
What we want as output is a solution path, an explicit
function $x^* : \mathbb{R} \to \mathbb{R}^n$ that describes the solution as a function
of the parameter $\mu$. It is well known that the solution $x^*$
is piecewise linear in the parameter $\mu$ if $c$ and $b$ are linear
functions in $\mu$, see for example [19].
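
As a minimal illustration of such a piecewise linear path, consider
a toy instance chosen here purely for exposition (it is not one of
the SVM instances studied below): take $n = 1$, $Q = 1$,
$c(\mu) = -\mu$, and no constraints other than $x \ge 0$.

```latex
\[
\mathrm{QP}(\mu): \quad \text{minimize}_{x} \; x^2 - \mu x
\quad \text{subject to} \quad x \ge 0 .
\]
% The unconstrained minimizer is x = \mu / 2, which is feasible iff \mu \ge 0.
\[
x^*(\mu) = \max\!\left(0, \tfrac{\mu}{2}\right)
\]
```

The path is linear on each of the two intervals $\mu \le 0$ and
$\mu \ge 0$, and has a single bend at $\mu = 0$.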



Solution path algorithms. An algorithm to compute the
entire solution path of the C-SVM was originally reported
by Hastie et al. [14]. An analogous algorithm for the LASSO
was given in [9], and later [17] and [16] proposed solution path
algorithms for the $\nu$-SVM and the one-class SVM, respectively.
Receiver Operating Characteristic (ROC) curves of SVMs
were also recently computed by such methods [3]. Support vector
regression (SVR) is interesting as its underlying quadratic
program depends on two parameters, a regularization
parameter (for which the solution path was tracked by [13]
[27] [17]) and a tube-width parameter (for which [25] recently
gave a solution path algorithm).

Generic solution algorithms for parametric quadratic
programming of the form (1), such as [20] [18] and recently [2],
can be applied to the above-mentioned applications, instead
of using a different algorithm description for each variant.
Compared to the above-mentioned approaches, these generic
algorithms also have the advantage that they are able to deal
with arbitrary kernel matrices, which do not necessarily have
to be invertible.



Complexity of solution paths. Based on empirical
observations, Hastie et al. [14] conjectured that the complexity of
the solution path of the two-class SVM, i.e., the number of
bends, is linear in the number of training points. This
conjecture was repeatedly stated for the related models in [14]
[13] [3] [26] [21] [28] [29] [25]. Here we disprove the conjecture
by showing that the complexity in the SVM case can indeed
be exponential in the number of training points.



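Support Vector Machines. The SVM is a well-studied
standard tool for classification problems. The primal $\nu$-SVM
problem [7] is the following pQP (the related C-SVM and
its dual are pQPs of very similar form):

\[
\begin{array}{ll}
\text{minimize}_{w,\rho,b,\xi} & \frac{1}{2}\|w\|^2 - \nu\rho + \frac{1}{n}\sum_{i=1}^{n}\xi_i \\
\text{subject to} & y_i(w^T x_i + b) \ge \rho - \xi_i \quad \forall i, \\
 & \xi \ge 0, \\
 & \rho \ge 0,
\end{array}
\tag{2}
\]

where $y_i \in \{\pm 1\}$ is the class label of data point $x_i$ and
$\nu$ is the regularization parameter. The dual of the
$\nu$-SVM, for $\mu := \frac{2}{n\nu}$, is the following pQP (observe that the
regularization parameter moves from the objective function
to the constraints):

\[
\begin{array}{ll}
\text{minimize}_{\alpha} & \sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^T x_j \\
\text{subject to} & \sum_{i:\, y_i = +1} \alpha_i = 1, \\
 & \sum_{i:\, y_i = -1} \alpha_i = 1, \\
 & 0 \le \alpha_i \le \mu .
\end{array}
\tag{3}
\]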
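
The following short derivation is added as a reader's check and is
not taken from the original text. It shows why the dual objective is
a squared Euclidean distance, using only $y_i \in \{\pm 1\}$:

```latex
\[
\sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^T x_j
= \Bigl\| \sum_{i} \alpha_i y_i x_i \Bigr\|^2
= \Bigl\| \sum_{i:\, y_i = +1} \alpha_i x_i
        - \sum_{i:\, y_i = -1} \alpha_i x_i \Bigr\|^2 .
\]
```

When the weights within each class sum to 1, each of the two sums is
a convex combination of the points of its class, so the objective
measures the squared distance between one point chosen from each
class's (weight-restricted) hull.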

2. COMPLEXITY OF THE SVM SOLUTION 
PATH 

We will now give an example of a two-class SVM instance, 
where the solution path has exponential complexity (i.e. ex- 
ponentially many bends) in the number of training samples, 
for the case where no kernel is used. 



For this, we align a large number $n_+$ of points of one
class on a circle segment, and place the other class, consisting
of just two vertices, below it, as depicted in Figure 2.

As $\mu$ decreases from 1 down to $\frac{1}{2}$, the "left" end of the
optimal distance vector, which is a multiple of the optimal
$w(\mu)$, walks through nearly all of the boundary faces of the
blue class. More precisely, the path of the optimal $w(\mu)$, for
$1 \ge \mu \ge \frac{1}{2}$, makes at least twice the number of "inner" blue
vertices many bends, which proves the claim.

2.2 The High-Dimensional Case 

The Goldfarb cube. It is known that the $2d$ facets of the
ordinary unit cube in $\mathbb{R}^d$ can be perturbed slightly such that,
when we project the resulting polytope onto the last two
coordinates, every vertex is visible in the "shadow".
We denote the two-dimensional plane spanned by the
last two coordinate vectors by $S$. This perturbed version
of the cube is called the Goldfarb cube, and it already served
as an example on which the simplex algorithm needs an
exponential number of steps to find the optimal solution to
a linear program [12].



To avoid confusion: our example does not just show that
some particular algorithm needs exponentially many steps
to compute the solution path (as is the case, for example, for
the simplex algorithm in linear programming), but indeed shows
that any algorithm reporting the solution path will need
exponential time, because the path in our example is unique and
has exponentially many bends.



Geometric interpretation of the two-class SVM. The dual
(3) of the $\nu$-SVM, for $\mu = \frac{2}{n\nu}$, is exactly the polytope
distance problem between the reduced convex hulls of the two
classes, or formally

\[
\mathrm{dist}\bigl(\mathrm{conv}_\mu(\{x_i \mid y_i = +1\}),\ \mathrm{conv}_\mu(\{x_i \mid y_i = -1\})\bigr),
\]

where

\[
\mathrm{conv}_\mu(P) := \Bigl\{ \sum_{p \in P} \alpha_p\, p \;\Bigm|\; 0 \le \alpha_p \le \mu,\ \sum_{p \in P} \alpha_p = 1 \Bigr\}
\]

is the reduced convex hull of a set of points $P$, for a given
parameter $\mu$, $0 < \mu \le 1$.
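
To make the reduced convex hull concrete: a linear function over
$\mathrm{conv}_\mu(P)$ is maximized by a greedy rule that gives the
maximum allowed weight $\mu$ to the points with the largest inner
product until the weights sum to 1. The Python sketch below computes
such an extreme point. The function name and the use of exact
fractions are choices made here for illustration; they do not come
from the original text.

```python
from fractions import Fraction


def reduced_hull_extreme_point(points, direction, mu):
    """Return the point of conv_mu(points) that is extreme in `direction`.

    Maximizes sum_p alpha_p * <p, direction> subject to
    0 <= alpha_p <= mu and sum_p alpha_p = 1, by greedily assigning
    the largest allowed weight to the points with the largest
    inner product with `direction`.
    """
    mu = Fraction(mu)
    if mu <= 0 or mu * len(points) < 1:
        raise ValueError("conv_mu is empty unless mu >= 1/len(points)")

    def dot(p):
        return sum(Fraction(a) * Fraction(b) for a, b in zip(p, direction))

    ordered = sorted(points, key=dot, reverse=True)
    remaining = Fraction(1)              # weight still to be distributed
    result = [Fraction(0)] * len(direction)
    for p in ordered:
        weight = min(mu, remaining)      # cap each weight at mu
        result = [r + weight * Fraction(c) for r, c in zip(result, p)]
        remaining -= weight
        if remaining == 0:
            break
    return tuple(result)
```

With $\mu = 1$ the call returns one of the input points (the ordinary
convex hull), while with $\mu = \frac{1}{2}$ it returns the midpoint of the
two points lying furthest in the given direction, which is how the
hull shrinks as $\mu$ decreases.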

We have chosen to present the $\nu$-SVM (instead of the
C-SVM) because its regularization parameter $\nu$ is
straightforward to interpret geometrically, as described above.
However, this geometric interpretation also holds for the C-SVM,
as shown by G] , and the correspondence [6] between the two
versions implies that our following lower bounds for the
solution path complexity also hold for the C-SVM case.

2.1 A First Example in Two Dimensions 

Hastie et al. [14] conjectured that the number of bends in the
solution path of a two-class SVM is at most $k \min(n_+, n_-)$,
where $k$ is some number in the range between 4 and 6 and
$n_+$ and $n_-$ are the sizes of the two classes. First we give
an example of an input to the SVM for which the solution
path has at least $2(\max(n_+, n_-) - 3)$ bends.



The dual of the Goldfarb cube. The dual of a polytope
$P$, or polar in the terminology of Ziegler [30], is defined as

\[
P^* = \{\, y \in \mathbb{R}^d \mid x^T y \le 1 \ \ \forall x \in P \,\}.
\]

In the case that $P$ contains the origin, this is equivalent to

\[
P^* = \{\, y \in \mathbb{R}^d \mid v^T y \le 1 \ \ \forall v \in V(P) \,\},
\]

and this representation is minimal in the number of
constraints. By $V(P)$ we denote the vertices of $P$; see also
[30, Theorem 2.11].

The dual polytope of the cube is the cross-polytope, having
linearly many vertices ($2d$ to be precise) and exponentially
many facets ($2^d$ of them). The dual of the Goldfarb cube is
thus a perturbed version of the cross-polytope, also having
only $2d$ vertices but $2^d$ facets. We initially shift the Goldfarb
cube such that the origin lies in its interior.
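
These counts can be checked directly for the unperturbed
cross-polytope. The Python sketch below enumerates its $2d$ vertices
$\pm e_i$ and its $2^d$ facet-defining sign vectors, and lists the
vertices lying on a given facet. The function names are illustrative
choices made here, and the sketch covers only the standard
cross-polytope, not the perturbed dual of the Goldfarb cube.

```python
from itertools import product


def cross_polytope(d):
    """Return (vertices, facet_normals) of the d-dimensional cross-polytope.

    Vertices are the 2d points +e_i and -e_i. Each facet is
    {x : <s, x> = 1} for a sign vector s in {+1, -1}^d, i.e. the
    polar counterpart of the cube vertex s.
    """
    vertices = []
    for i in range(d):
        for sign in (1, -1):
            v = [0] * d
            v[i] = sign
            vertices.append(tuple(v))
    facet_normals = list(product((1, -1), repeat=d))
    return vertices, facet_normals


def vertices_on_facet(vertices, normal):
    """Return the vertices v with <normal, v> = 1, i.e. those on the facet."""
    return [v for v in vertices
            if sum(a * b for a, b in zip(v, normal)) == 1]
```

For $d = 3$ this gives 6 vertices and 8 facets with 3 vertices each
(triangles), and for $d = 8$ it gives 16 vertices and 256 facets.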

We now want to translate the "shadow" property of the
Goldfarb cube $P$ to its dual polytope $D$. By looking at the dual
constraint $v^T y \le 1$ for each vertex $v \in V(P)$, it is
immediately clear that the dual of the projection of any polytope $P$
(which contains the origin) onto $S$ is exactly the intersection
of $D$, the dual of $P$, with $S$. In other words, both
representations coincide if we restrict ourselves to just the last two
coordinates. So in our case, the fact that each vertex $v_i$ of
the Goldfarb cube, $1 \le i \le 2^d$, is visible in the 2-dimensional
shadow on $S$ implies that the intersection of $D$ with
$S$ also has exactly that many boundary segments or facets, and
the same number of vertices, $2^d$ each.
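
The duality between projection and intersection can be spelled out
in one line. This is a reader's sketch: the projection map $\pi_S$
onto the last two coordinates is notation introduced here, not in the
original. For $y \in S$ only the last two coordinates of $y$ are
nonzero, so $v^T y = \pi_S(v)^T y$ for every vertex $v$, and therefore

```latex
\[
D \cap S
= \{\, y \in S \mid v^T y \le 1 \ \ \forall v \in V(P) \,\}
= \{\, y \in S \mid \pi_S(v)^T y \le 1 \ \ \forall v \in V(P) \,\}
= \bigl( \pi_S(P) \bigr)^* ,
\]
```

where the last polar is taken within the plane $S$, and the last
equality uses that $\pi_S(P)$ is the convex hull of the projected
vertices.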

Since there are finitely many vertices, and since no vertex of
$D$ lies on any coordinate plane, it is also clear that we
can extend the plane $S$ to a thin slab

\[
U := \{\, x \in \mathbb{R}^d \mid |x_j| \le \varepsilon, \ 1 \le j \le d-2 \,\}
\]

of thickness $2\varepsilon$, for some $\varepsilon > 0$, such that the intersection
of $D$'s boundary with this slab still does not contain any
vertices. This implies that inside the slab, the combinatorial
type of the polytope $D$ is the same as on the plane $S$; thus
any path running around the polytope on its boundary, and
staying in this slab, runs through the interior of all of the $2^d$
facets.

Figure 1: Two-dimensional example of an SVM path with at least $\max(n_+, n_-)$ many bends. The green lines
indicate the optimal solutions to the polytope distance problem for the indicated parameter value of $\mu$.

Figure 2: The 3-dimensional cross-polytope $D$. If
you imagine the vertices $a$ and $c$ lying just slightly
behind the intersection plane and the vertices $b$
and $d$ just slightly in front of $S$, then the
plane $S$ intersects all $2^3 = 8$ triangular facets.

Now we define the larger one of our two point classes to
consist of exactly the vertices of the described polytope $D$,
which we use as a replacement for the circle segment in the above
2-dimensional example. Again we let the second polytope
$Q$ consist of just a line segment spanned by two vertices, living
in $S$ as in the above example, when we think of $S$ as the
2-dimensional plane which houses Figure 2. By stretching¹ the
polytope $D$ away from the plane $S$, it is easy to achieve that
for every point $q \in Q$, the closest point of $D$ to $q$
lies in the slab $U$.

Given this construction, it again follows directly that when
we decrease the regularization parameter $\mu$ in the SVM from
1 down to $\frac{1}{2}$, the solution passes through at least a constant
fraction of the at least $2^d$ facets of the reduced hull
$\mathrm{conv}_\mu(V(D))$, and thus the path has at least that many bends,
which is exponential in $\max(n_+, n_-)$, by our choice of $n_+ = 2d$.

¹ In other words: we scale up all coordinates of the
polytope vertices, except the 2 coordinates which define our fixed
plane $S$.



3. EXPERIMENTS 

We have implemented the above Goldfarb cube construction
using exact arithmetic, and could confirm the theoretical
findings. We constructed the "stretched" dual of the
Goldfarb cube using Polymake [11]; see Figure 3 for a
visualization of its intersection with the two-dimensional plane
$S$. Having the exact constraint formulation of the
polytope, we then used the exact (rational arithmetic) quadratic
programming solver of CGAL In to calculate the optimal
distances for different discrete values of $\mu$. For $d \le 8$, in
all cases we obtained significantly more than the guaranteed
fraction of $2^d$ bends in the path (we only counted a bend when
the set of support vectors strictly changed when going from
one $\mu$ value to the next).




Figure 3: Example for $d = 8$: the perturbed
cross-polytope with 16 vertices intersected with the
two-dimensional plane has 256 "bends". Command sequence
used in Polymake: Goldfarb gfarb.poly 8
1/3 1/12; center gcenter.poly gfarb.poly; polarize
gpolar.poly gcenter.poly; intersection gint.poly
gpolar.poly plane.poly; polymake gint.poly.